Plot Model Projections Output

The hubVis package contains a function called plot_step_ahead_model_output() that can be used to plot model output in the format of forecasts or projections that look multiple horizons into the future.

This function plots forecasts/scenario projections and optional truth data. Faceted plots can be created for multiple scenarios, locations, forecast dates, models, etc. Currently, the function can plot only quantile data, with the possibility to add “median” information from the model projections.

For more information about the Hubverse standard format, please refer to the HubDocs website.

The following vignette describes the principal usage of the plot_step_ahead_model_output() function.

library(hubVis)
library(hubUtils)

Plots are available in two output formats: interactive (the default) and static.

Load and Filter Data

The package contains two datasets that will be used for the following examples: example_round1.csv (model projections) and truth_data.csv (truth data).

Load data

projection_path <- system.file("example_round1.csv", package = "hubVis")
projection_data <- read.csv(projection_path, stringsAsFactors = FALSE)
projection_data <- as_model_out_tbl(projection_data)
head(projection_data)
#> # A tibble: 6 × 9
#>   model_id        origin_date scenario_id  location target   horizon output_type
#>   <chr>           <chr>       <chr>        <chr>    <chr>      <int> <chr>      
#> 1 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 2 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 3 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 4 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 5 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 6 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> # ℹ 2 more variables: output_type_id <dbl>, value <dbl>

truth_path <- system.file("truth_data.csv", package = "hubVis")
truth_data <- read.csv(truth_path, stringsAsFactors = FALSE)
head(truth_data)
#>     time_idx location value   target
#> 1 2020-01-25       02     0 inc case
#> 2 2020-01-25       01     0 inc case
#> 3 2020-01-25       05     0 inc case
#> 4 2020-01-25       04     0 inc case
#> 5 2020-01-25       06     0 inc case
#> 6 2020-01-25       08     0 inc case
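
Note that read.csv() does not parse dates, so time_idx in truth_data is a character column. Comparisons between ISO-formatted date strings still order correctly, but converting the column to Date explicitly can make later date filtering easier to reason about. The following is a minimal sketch on a made-up data frame; toy_truth, window_end and all values in it are hypothetical and are not taken from the package data.

```r
# Made-up truth data for illustration; values are not from the package
toy_truth <- data.frame(
  time_idx = c("2020-09-26", "2020-10-03", "2021-03-27", "2021-04-10"),
  location = "US",
  value = c(10, 20, 30, 40),
  target = "inc case",
  stringsAsFactors = FALSE
)

# Convert the character column to Date so that later comparisons are
# explicit date comparisons
toy_truth$time_idx <- as.Date(toy_truth$time_idx)

# Hypothetical upper bound for the time window
window_end <- as.Date("2021-03-13") + 21

# Keep rows strictly inside the window
toy_truth_window <- toy_truth[
  toy_truth$time_idx > as.Date("2020-10-01") & toy_truth$time_idx < window_end,
]
format(toy_truth_window$time_idx)
#> [1] "2020-10-03" "2021-03-27"
```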

Data Preparation

The model output data in the projection_data object follows the structure of the model_out_tbl class; the dataset was converted to a model_out_tbl object after being read in above. In addition to the standard requirements for this class, the plot_step_ahead_model_output() function in hubVis requires that the dataset have a column whose values are used for the x-axis of a “step ahead” plot. In general, this should be a date variable giving the date that is the “target” of a particular prediction. By default, the function looks for the "target_date" column, although this can be overridden by specifying a different column with the x_col_name argument. In our example data, this column does not exist, so we add it below:

projection_data <- dplyr::mutate(
  projection_data, target_date = as.Date(origin_date) + (horizon * 7) - 1)
head(projection_data)
#> # A tibble: 6 × 10
#>   model_id        origin_date scenario_id  location target   horizon output_type
#>   <chr>           <chr>       <chr>        <chr>    <chr>      <int> <chr>      
#> 1 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 2 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 3 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 4 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 5 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> 6 HUBuni-simexamp 2021-03-07  A-2021-03-05 US       inc case       1 quantile   
#> # ℹ 3 more variables: output_type_id <dbl>, value <dbl>, target_date <date>
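
The date arithmetic used above can be checked on a few values. This is a minimal sketch in base R, assuming weekly horizons and a target date that falls on the last day of each 7-day period, as in the mutate() call; the standalone vectors below are illustrative only.

```r
# Illustrative values only: one origin date and three weekly horizons
origin_date <- as.Date("2021-03-07")
horizon <- 1:3

# Same arithmetic as the mutate() call: each horizon step is one week,
# and the target date is the last day of that week
target_date <- origin_date + (horizon * 7) - 1
format(target_date)
#> [1] "2021-03-13" "2021-03-20" "2021-03-27"
```

For a hub with a different time step (daily or monthly horizons, for example), the multiplier and offset would need to be adapted accordingly.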

Plot

The plotting function requires only two parameters: the model output data and the truth data.

“Simple” plot

The projection_data and truth_data objects contain information for multiple locations and scenarios.

To plot the model projections for the US, Scenario A:

# Pre-filtering
projection_data_A_us <- dplyr::filter(projection_data, 
                                      scenario_id == "A-2021-03-05", 
                                      location == "US")

# Limit time_idx for layout reasons
truth_data_us <- dplyr::filter(truth_data, location == "US", 
                               time_idx < min(projection_data$target_date) + 21,
                               time_idx > "2020-10-01")
plot_step_ahead_model_output(projection_data_A_us, truth_data_us)

By default, the 50%, 80% and 95% intervals are plotted, with a specific color palette per model_id.

In general, it is hard to see multiple intervals when multiple models are plotted, so specifying only one interval can be useful:

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             intervals = 0.8)

It is also possible to add a median line to the plot with the use_median_as_point parameter:

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             intervals = 0.8,
                             use_median_as_point = TRUE)

By default, plots are interactive, but this can easily be switched to static:

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             intervals = 0.8,
                             use_median_as_point = TRUE, 
                             interactive = FALSE)

Facet plot

A “facet” (or subplot) plot can also be created for each scenario:

# Pre-filtering
projection_data_us <- dplyr::filter(projection_data,
                                    location == "US")
# Limit time_idx for layout reasons
truth_data_us <- dplyr::filter(truth_data, location == "US", 
                               time_idx < min(projection_data$target_date) + 21,
                               time_idx > "2020-10-01")
plot_step_ahead_model_output(projection_data_us, truth_data_us, 
                             facet = "scenario_id")

The layout of the “facets” can be adjusted with the different facet_ parameters:

plot_step_ahead_model_output(projection_data_us, truth_data_us, 
                             use_median_as_point = TRUE,
                             facet = "scenario_id", facet_scales = "free_x", 
                             facet_nrow = 2, facet_title = "bottom left")

Or with the additional facet_ncol parameter for the static plot:

plot_step_ahead_model_output(projection_data_us, truth_data_us, 
                             use_median_as_point = TRUE, interactive = FALSE,
                             facet = "scenario_id", facet_scales = "free_x", 
                             facet_ncol = 4, facet_title = "bottom left")

A “facet” (or subplot) plot can also be created for each model. In this case, the legend will be adapted to show the model_id values.

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             facet = "model_id")

The legend can be removed with the parameter show_legend = FALSE.

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             facet = "model_id", show_legend = FALSE)

Intervals

By default, the 50%, 80% and 95% intervals are plotted. However, it is also possible to plot the 90% interval or a subset of these intervals. When plotting 6 or more models, the plot will be reduced to show only the widest interval provided (95% by default).

To illustrate this, we will use the projections for only one model:

# Pre-filtering
projection_data_mod <- dplyr::filter(projection_data,
                                     location == "US", 
                                     model_id == "hub-ensemble")
plot_step_ahead_model_output(projection_data_mod, truth_data_us, 
                             use_median_as_point = TRUE, facet = "scenario_id", 
                             facet_nrow = 2, intervals = c(0.5, 0.8, 0.9, 0.95))
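
To relate the interval levels passed to intervals to the quantile levels stored in output_type_id, the sketch below assumes central (equal-tailed) prediction intervals. That assumption and the helper name interval_to_quantiles are introduced here for illustration and are not taken from the hubVis documentation.

```r
# Hypothetical helper: lower and upper quantile levels of central intervals
interval_to_quantiles <- function(intervals) {
  data.frame(
    interval = intervals,
    # Rounding removes floating-point noise such as 0.09999999999999998
    lower = round((1 - intervals) / 2, 4),
    upper = round(1 - (1 - intervals) / 2, 4)
  )
}

interval_to_quantiles(c(0.5, 0.8, 0.9, 0.95))
#>   interval lower upper
#> 1     0.50 0.250 0.750
#> 2     0.80 0.100 0.900
#> 3     0.90 0.050 0.950
#> 4     0.95 0.025 0.975
```

Under this assumption, plotting the 90% interval would require the 0.05 and 0.95 quantiles to be present in the model output.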

The opacity of the intervals can be adjusted:

plot_step_ahead_model_output(projection_data_mod, truth_data_us, 
                             use_median_as_point = TRUE, facet = "scenario_id", 
                             facet_nrow = 2, intervals = c(0.5, 0.8, 0.9, 0.95),
                             fill_transparency = 0.15)

Plots without intervals are also possible:

plot_step_ahead_model_output(projection_data_mod, truth_data_us, 
                             use_median_as_point = TRUE, facet = "scenario_id", 
                             facet_nrow = 2, intervals = NULL)

Other parameters

Several other parameters are available to update the plot output. Here are examples of some of them.

“Ensemble” layout

It is possible to assign a specific color and behavior to a specific model_id. Typically, this is done to highlight an ensemble, so the names of these arguments are ens_name and ens_color. The model specified by ens_name will be the top layer of the resulting plot.

plot_step_ahead_model_output(projection_data_us, truth_data_us,
                             use_median_as_point = TRUE, 
                             facet = "scenario_id", facet_nrow = 2,
                             ens_name = "hub-ensemble", ens_color = "black",
                             intervals = 0.8)

Layout update

Several layout updates are possible:

  • Not showing the truth data in the plot:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             plot_truth = FALSE)
  • Change the top layer to the truth data:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             top_layer = "truth")
  • Add a title to the plot:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             title = "Incident Cases in the US")
  • Change palette color and behavior:

    • The default palette can be changed. The available palette names can be displayed with:
    RColorBrewer::display.brewer.all()

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             pal_color = "Dark2")
    • By default, separate colors will be used for each model.

The fill_by parameter can be changed to another valid column name so that the legend and color attributes are based on that column instead.

plot_step_ahead_model_output(projection_data_us, truth_data_us, 
                             facet = "model_id", fill_by = "scenario_id")

It is possible to use only blues for all models by setting the pal_color parameter to NULL. This might be especially useful for plots with many models, in conjunction with highlighting the ensemble forecast using the ens_name and ens_color arguments.

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             intervals = 0.8,
                             ens_name = "hub-ensemble", ens_color = "black",
                             pal_color = NULL, use_median_as_point = TRUE)

The default blue color can be changed with the one_color parameter:

plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             intervals = 0.8, one_color = "orange",
                             ens_name = "hub-ensemble", ens_color = "black",
                             pal_color = NULL, use_median_as_point = TRUE)
  • Interactive/Static plot:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             interactive = FALSE)

  • Column Names:

The input data frames can have different column names for the date information. In this case, the x_col_name and x_truth_col_name parameters can be used to indicate the variables that should be mapped to the x-axis.

names(truth_data_us)[names(truth_data_us) == "time_idx"] <- "time"
names(projection_data_A_us)[names(
  projection_data_A_us) == "target_date"] <- "date"
plot_step_ahead_model_output(projection_data_A_us, truth_data_us, 
                             x_col_name = "date", x_truth_col_name = "time")

For other parameters, please consult the documentation associated with the function:
?plot_step_ahead_model_output